SpeeDO: Parallelizing Stochastic Gradient Descent for Deep Convolutional Neural Network
Authors
Abstract
Convolutional Neural Networks (CNNs) have achieved breakthrough results on many machine learning tasks. However, training CNNs is computationally intensive. When the training set is large and the network is deep, as is typically required for high classification accuracy, training a model can take days or even weeks. In this work, we propose SpeeDO (for Open DEEP learning System in backward order). SpeeDO uses off-the-shelf hardware to speed up CNN training, aiming at two goals. First, parallelizing stochastic gradient descent (SGD) on a GPU cluster built from off-the-shelf hardware improves deployability and cost effectiveness. Second, such a widely deployable hardware configuration can serve as a benchmark on which software and algorithmic approaches can be evaluated and improved. Our experiments compared representative parallel SGD schemes and identified bottlenecks where overhead can be further reduced.
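As context for the parallel SGD schemes being compared, the sketch below illustrates synchronous data-parallel SGD, in which each worker computes a gradient on its own data shard and a central step averages those gradients before updating the shared weights. It is a minimal NumPy simulation on a toy least-squares problem; the worker count, shard layout, and model are illustrative assumptions, not SpeeDO's actual implementation.

```python
import numpy as np

# Toy least-squares problem standing in for a CNN: minimize ||Xw - y||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 16))
true_w = rng.normal(size=16)
y = X @ true_w + 0.01 * rng.normal(size=1024)

NUM_WORKERS = 4          # assumed number of GPU workers
LR = 0.1
shards = np.array_split(np.arange(len(X)), NUM_WORKERS)

def local_gradient(w, idx):
    """Gradient of the mean squared error on one worker's shard."""
    Xi, yi = X[idx], y[idx]
    return 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)

w = np.zeros(16)
for step in range(200):
    # Each worker computes a gradient on its shard (in parallel on real hardware).
    grads = [local_gradient(w, idx) for idx in shards]
    # Synchronous scheme: average the gradients, then apply a single update.
    w -= LR * np.mean(grads, axis=0)

print("final loss:", np.mean((X @ w - y) ** 2))
```

The synchronization barrier after each step is exactly where stragglers and communication overhead show up, which is the kind of bottleneck such a benchmark setup is meant to expose.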
Similar resources
The Outer Product Structure of Neural Network Derivatives
Training methods for neural networks are primarily variants on stochastic gradient descent. Techniques that use (approximate) second-order information are rarely used because of the computational cost and noise associated with those approaches in deep learning contexts. We can show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutiona...
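The outer-product structure mentioned in the snippet can be seen in the standard backpropagation identity for a fully connected layer; the notation below (layer index $l$, activation $a_l = \sigma(z_l)$ with $z_l = W_l a_{l-1} + b_l$) is a generic illustration rather than the paper's own derivation.

```latex
\[
\frac{\partial L}{\partial W_l} \;=\; \delta_l \, a_{l-1}^{\top},
\qquad
\delta_l \;:=\; \frac{\partial L}{\partial z_l}
          \;=\; \bigl(W_{l+1}^{\top}\,\delta_{l+1}\bigr) \odot \sigma'(z_l).
\]
```

That is, the gradient with respect to each weight matrix is the outer product of a backpropagated error vector and a forward activation vector, which is the kind of structure the snippet refers to.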
Distributed asynchronous optimization of convolutional neural networks
Recently, deep Convolutional Neural Networks have been shown to outperform Deep Neural Networks for acoustic modelling, producing state-of-the-art accuracy in speech recognition tasks. Convolutional models provide increased model robustness through the usage of pooling invariance and weight sharing across spectrum and time. However, training convolutional models is a very computationally expens...
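As a rough illustration of the asynchronous style of optimization described above (not the paper's exact system), the sketch below lets several Python threads apply gradient updates to a shared parameter vector without waiting for one another, so some updates are computed from stale weights. The shard layout, learning rate, and step count are illustrative assumptions.

```python
import threading
import numpy as np

# Toy least-squares problem standing in for an acoustic model.
rng = np.random.default_rng(1)
X = rng.normal(size=(1024, 16))
y = X @ rng.normal(size=16)

w = np.zeros(16)                     # shared parameters ("parameter server")
LR = 0.05
shards = np.array_split(np.arange(len(X)), 4)

def worker(idx):
    # Each worker repeatedly reads the current w, computes a local gradient,
    # and applies it immediately -- no barrier, so staleness is tolerated.
    for _ in range(200):
        Xi, yi = X[idx], y[idx]
        grad = 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)
        w[:] -= LR * grad            # lock-free in-place update

threads = [threading.Thread(target=worker, args=(idx,)) for idx in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final loss:", np.mean((X @ w - y) ** 2))
```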
GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange
We address the issue of speeding up the training of convolutional neural networks by studying a distributed method adapted to stochastic gradient descent. Our parallel optimization setup uses several threads, each applying individual gradient descents on a local variable. We propose a new way of sharing information between different threads based on gossip algorithms that show good consensus co...
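As a sketch of the gossip idea described above (an illustration of the general mechanism, not GoSGD's actual exchange rule), each worker below keeps its own copy of the parameters, takes local SGD steps, and occasionally averages its copy with a randomly chosen peer instead of communicating with a central server. The exchange frequency and worker count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1024, 16))
y = X @ rng.normal(size=16)

NUM_WORKERS = 4
LR = 0.1
GOSSIP_EVERY = 5                                      # assumed exchange frequency
shards = np.array_split(np.arange(len(X)), NUM_WORKERS)
params = [np.zeros(16) for _ in range(NUM_WORKERS)]   # one local copy per worker

for step in range(200):
    # Local SGD step on each worker's own parameter copy.
    for k, idx in enumerate(shards):
        Xi, yi = X[idx], y[idx]
        params[k] -= LR * 2.0 * Xi.T @ (Xi @ params[k] - yi) / len(idx)
    # Gossip exchange: a random pair of workers averages their copies,
    # slowly driving all copies toward consensus without a central server.
    if step % GOSSIP_EVERY == 0:
        i, j = rng.choice(NUM_WORKERS, size=2, replace=False)
        mixed = 0.5 * (params[i] + params[j])
        params[i], params[j] = mixed.copy(), mixed.copy()

consensus = np.mean(params, axis=0)
print("final loss:", np.mean((X @ consensus - y) ** 2))
```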
Overcoming Challenges in Fixed Point Training of Deep Convolutional Networks
It is known that training deep neural networks, in particular deep convolutional networks, with aggressively reduced numerical precision is challenging. The stochastic gradient descent algorithm becomes unstable in the presence of noisy gradient updates resulting from arithmetic with limited numeric precision. One of the well-accepted solutions facilitating the training of low precision fixed p...
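The snippet is cut off, but stochastic rounding is one commonly cited remedy for low-precision training; the sketch below quantizes values to a fixed-point grid while rounding up or down at random so that the result is unbiased in expectation. The function name and the 8 fractional bits are illustrative assumptions, not necessarily the solution this particular paper proposes.

```python
import numpy as np

def stochastic_round_fixed_point(x, frac_bits=8):
    """Quantize x to a fixed-point grid with `frac_bits` fractional bits.

    Instead of always rounding to the nearest grid point, round up with
    probability equal to the fractional remainder, so E[quantized] == x.
    """
    scale = 2 ** frac_bits
    scaled = np.asarray(x) * scale
    floor = np.floor(scaled)
    prob_up = scaled - floor                      # distance above the lower grid point
    rounded = floor + (np.random.random(np.shape(scaled)) < prob_up)
    return rounded / scale

# Tiny gradient values survive in expectation even below the grid resolution.
g = np.full(100_000, 1e-4)                        # smaller than 1/2**8 ≈ 0.0039
print(stochastic_round_fixed_point(g).mean())     # ≈ 1e-4 on average
```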
Accelerating deep neural network training with inconsistent stochastic gradient descent
Stochastic Gradient Descent (SGD) updates Convolutional Neural Network (CNN) with a noisy gradient computed from a random batch, and each batch evenly updates the network once in an epoch. This model applies the same training effort to each batch, but it overlooks the fact that the gradient variance, induced by Sampling Bias and Intrinsic Image Difference, renders different training dynamics on...
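As a rough sketch of spending unequal training effort on different batches (an illustration of the general idea, not this paper's exact control policy), the loop below takes extra gradient steps on batches whose loss is well above a running average; the threshold, extra-step cap, and smoothing factor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1024, 16))
y = X @ rng.normal(size=16)

LR = 0.05
batches = np.array_split(rng.permutation(len(X)), 16)
w = np.zeros(16)
running_loss = None

for epoch in range(10):
    for idx in batches:
        Xi, yi = X[idx], y[idx]
        loss = np.mean((Xi @ w - yi) ** 2)
        running_loss = loss if running_loss is None else 0.9 * running_loss + 0.1 * loss
        # Spend more effort (extra SGD steps) on batches that look harder than
        # the running average, instead of updating once per batch per epoch.
        extra = 2 if loss > 1.5 * running_loss else 0
        for _ in range(1 + extra):
            grad = 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)
            w -= LR * grad

print("final loss:", np.mean((X @ w - y) ** 2))
```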